Semi-supervised eigenvectors for large-scale locally-biased learning

Authors

  • Toke Jansen Hansen
  • Michael W. Mahoney
Abstract

In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks “nearby” that prespecified target region. For example, one might be interested in the clustering structure of a data graph near a prespecified “seed set” of nodes, or one might be interested in finding partitions in an image that are near a prespecified “ground truth” set of pixels. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, which limits the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, conditioned on being well-correlated with an input seed set of nodes that is assumed to be provided in a semi-supervised manner. We show that these semi-supervised eigenvectors can be computed quickly as the solution to a system of linear equations, and we also describe several variants of our basic method that have improved scaling properties. We provide several empirical examples demonstrating how these semi-supervised eigenvectors can be used to perform locally-biased learning, and we discuss the relationship between our results and recent machine learning algorithms that use global eigenvectors of the graph Laplacian.
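
To make the "solution to a system of linear equations" claim concrete, here is a minimal sketch of computing a single locally-biased vector on a toy graph. It is an illustration under stated assumptions, not the paper's exact procedure: the linear system (L - gamma D) x = D s, the negative value of `gamma`, the barbell toy graph, and the helper names `barbell_adjacency` and `locally_biased_vector` are all assumptions made for this example. The abstract does not specify the system, and the successive orthogonalization against earlier vectors is omitted.

```python
import numpy as np


def barbell_adjacency(k):
    """Adjacency matrix of two k-node cliques joined by one edge (toy graph)."""
    n = 2 * k
    A = np.zeros((n, n))
    A[:k, :k] = 1.0
    A[k:, k:] = 1.0
    np.fill_diagonal(A, 0.0)          # no self-loops
    A[k - 1, k] = A[k, k - 1] = 1.0   # bridge between the cliques
    return A


def locally_biased_vector(A, seed_nodes, gamma=-0.1):
    """Solve (L - gamma * D) x = D s for a seed vector s (assumed form).

    gamma < 0 is an illustrative choice: it makes L - gamma * D positive
    definite, so a plain dense solve is enough for this sketch.
    """
    if gamma >= 0:
        raise ValueError("this sketch only handles gamma < 0")
    d = A.sum(axis=1)                 # node degrees
    D = np.diag(d)
    L = D - A                         # combinatorial graph Laplacian

    # Seed indicator, made D-orthogonal to the all-ones vector, unit D-norm.
    s = np.zeros(len(d))
    s[seed_nodes] = 1.0
    s -= (d @ s) / d.sum()
    s /= np.sqrt(s @ (d * s))

    x = np.linalg.solve(L - gamma * D, d * s)
    x /= np.sqrt(x @ (d * x))         # normalize so that x^T D x = 1
    return x, s, d


A = barbell_adjacency(5)
x, s, d = locally_biased_vector(A, seed_nodes=[0])
print(np.round(x, 3))
```

Because the right-hand side D s is orthogonal to the all-ones vector, the solution x is D-orthogonal to the trivial eigenvector without an explicit projection step. For large sparse graphs, an iterative sparse solver would normally replace the dense `np.linalg.solve` call.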


Similar resources

Semi-supervised Eigenvectors for Locally-biased Learning

In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks “nearby” that pre-specified target region. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analys...


Mapping the Similarities of Spectra: Global and Locally-biased Approaches to SDSS Galaxy Data

We apply a novel spectral graph technique, that of locally-biased semi-supervised eigenvectors, to study the diversity of galaxies. This technique permits us to characterize empirically the natural variations in observed spectra data, and we illustrate how this approach can be used in an exploratory manner to highlight both large-scale global as well as small-scale local structure in Sloan Digi...


Improving Semi-Supervised Target Alignment via Label-Aware Base Kernels

Semi-supervised kernel design is an essential step for obtaining good predictive performance in semi-supervised learning tasks. In the current literature, a large family of algorithms builds the new kernel as a weighted average of predefined base kernels. While optimal weighting schemes have been studied extensively, the choice of base kernels has received much less attention. Many methods...


Robust Image Analysis by L1-Norm Semi-supervised Learning

This paper presents a novel L1-norm semi-supervised learning algorithm for robust image analysis by giving a new L1-norm formulation of Laplacian regularization, which is the key step of graph-based semi-supervised learning. Since our L1-norm Laplacian regularization is defined directly over the eigenvectors of the normalized Laplacian matrix, we successfully formulate semi-supervised learning as a...


Mapping the Similarities of Spectra: Global and Locally-biased Approaches to SDSS Galaxies

We present a novel approach to studying the diversity of galaxies. It is based on a novel spectral graph technique, that of locally-biased semi-supervised eigenvectors. Our method introduces new coordinates that summarize an entire spectrum, similar to but going well beyond the widely used Principal Component Analysis (PCA). Unlike PCA, however, this technique does not assume that the Euclidean...


Journal:
  • Journal of Machine Learning Research

Volume 15

Publication date: 2014